feat: add PyTorch setup script for modular installation#117
feat: add PyTorch setup script for modular installation#117cmagina merged 6 commits intoredhat-et:mainfrom
Conversation
e09eb4a to
d5c4916
Compare
📝 WalkthroughWalkthroughAdds a new PyTorch installation script and integrates it into developer tooling: CI workflow path triggers, Docker images, Makefile container options, and devsetup/devinstall orchestration are updated to delegate PyTorch installation to the new installer. Changes
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer / CI trigger
participant CI as GitHub Actions
participant Image as Dockerfile build
participant Container as Run container (Makefile)
participant DevSetup as devsetup/devinstall_triton
participant TorchInstaller as devinstall_torch
participant PyTorchRepo as PyTorch repo / PyPI
Dev->>CI: push change (includes devinstall_torch.sh)
CI->>Image: build images (Dockerfiles copy script)
Dev->>Container: run_container (Makefile) sets INSTALL_TORCH / mounts torch_path
Container->>DevSetup: run devsetup.sh / devinstall_triton.sh
DevSetup->>TorchInstaller: invoke devinstall_torch (release/nightly/source/test)
TorchInstaller->>PyTorchRepo: clone/build or pip install wheels
TorchInstaller-->>DevSetup: install complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
294eac2 to
f61a04f
Compare
96a56df to
67d97a4
Compare
scripts/devinstall_torch.sh
Outdated
| git submodule update --init --recursive | ||
|
|
||
| if [ -n "${TORCH_GITREF:-}" ]; then | ||
| git checkout "" |
There was a problem hiding this comment.
looks like TORCH_GITREF might be missing here
There was a problem hiding this comment.
Wow, that is quite a miss, thanks.
scripts/devinstall_torch.sh
Outdated
| if [ -f requirements.txt ]; then | ||
| pip_install --group dev | ||
| pip_install mkl-static mkl-include | ||
| make triton |
There was a problem hiding this comment.
Doesn't this cause us to install triton twice due to the preceding devinstall_triton.sh?
There was a problem hiding this comment.
Those are the instructions from pytorch for installing build deps.
https://github.com/pytorch/pytorch?tab=readme-ov-file#install-dependencies
Although, re-reading the comment, that one should be left to the dev. Will remove it.
Makefile
Outdated
| INSTALL_NSIGHT ?=false | ||
| user_path ?= | ||
| torch_path ?= | ||
| user_path ?= |
hinriksnaer
left a comment
There was a problem hiding this comment.
left a couple of comments
Add scripts/setup_torch.sh to support PyTorch installation and configuration within containers. This script: - Downloads PyTorch source from GitHub when not mounted as a volume - Installs build dependencies for PyTorch compilation - Supports installing PyTorch wheels from PyPI (release, nightly, test) - Provides flexible configuration via INSTALL_TORCH environment variable The script supports multiple installation modes: - source: Build from source (with auto-download if not mounted) - release/nightly/test: Install wheels from PyPI - skip: Skip PyTorch installation This is part of the modular script architecture introduced in PR redhat-et#115. Signed-off-by: Craig Magina <cmagina@redhat.com>
d72cc14 to
0964c2e
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
scripts/devsetup.sh (1)
72-74:⚠️ Potential issue | 🟠 MajorAvoid potential double Torch installation in setup flow.
This new direct call can duplicate Torch installation if
scripts/devinstall_triton.shalready handlesINSTALL_TORCHfor the same run path, causing unnecessary setup time and possible environment drift.Suggested guard (if Triton install owns Torch setup)
-if [ "${INSTALL_TORCH:-skip}" != "skip" ]; then +if [ "${INSTALL_TORCH:-skip}" != "skip" ] && [ "${INSTALL_TRITON:-skip}" = "skip" ]; then run_as_user devinstall_torch "$INSTALL_TORCH" fi#!/bin/bash # Verify whether scripts/devinstall_triton.sh also installs or delegates Torch. # Expected: If this finds a direct torch install path, this block should be guarded to avoid duplicate execution. fd -a 'devinstall_triton.sh' rg -n -C3 '\bdevinstall_torch\b|\bINSTALL_TORCH\b' scripts/devinstall_triton.shAs per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/devsetup.sh` around lines 72 - 74, The script can double-install Torch because scripts/devinstall_triton.sh may already handle INSTALL_TORCH; update scripts/devsetup.sh so the devinstall_torch invocation is conditional: detect whether devinstall_triton.sh claims ownership (e.g., an exported flag like TRITON_HANDLES_TORCH or by grepping devinstall_triton.sh for INSTALL_TORCH/devinstall_torch) and only call run_as_user devinstall_torch "$INSTALL_TORCH" when that ownership flag is false or absent; ensure you reference INSTALL_TORCH and the run_as_user devinstall_torch invocation and add a clear comment explaining the guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/devinstall_torch.sh`:
- Around line 65-70: The script currently runs git submodule update before
applying TORCH_GITREF which can leave submodules pointing at the wrong commits;
change the order so that if TORCH_GITREF is set the script performs git checkout
"$TORCH_GITREF" first, then run git submodule sync and git submodule update
--init --recursive (or run git submodule sync before checkout and git submodule
update after checkout) to ensure submodules match the checked-out ref; update
the block around TORCH_GITREF, git checkout, git submodule sync, and git
submodule update to reflect this ordering.
- Around line 167-170: The current block only sets PIP_TORCH_INDEX_URL when
compute_platform is non-empty, which lets "nightly"/"test" builds fall back to
PyPI; update the logic around compute_platform/PIP_TORCH_INDEX_URL so that when
the requested channel is a non-stable one (detect via TORCH_VERSION containing
"nightly" or "test" or a TORCH_CHANNEL env var) you still set
PIP_TORCH_INDEX_URL (use pip_torch_index_url_base with compute_platform if
present, otherwise use pip_torch_index_url_base alone) and push it into
pip_install_args; change the code around the compute_platform check that
currently sets PIP_TORCH_INDEX_URL and pip_install_args to handle the
empty-compute_platform case for nightly/test to avoid resolving from default
PyPI.
- Around line 27-33: The current privilege check unconditionally exits for
non-root users without sudo by guarding SUDO, EUID and command -v sudo; change
it so non-root without sudo is allowed when doing wheel-only installs (e.g.,
install modes 'release', 'nightly', 'test') by first detecting the requested
install mode/argument and only enforcing the sudo requirement for modes that
need system package installation. Concretely, inspect the install mode
variable/argument used earlier in the script and wrap the existing
SUDO/EUID/command -v sudo logic so it runs only for non-wheel modes; keep the
SUDO variable, EUID check and command -v sudo usage but skip the exit branch for
wheel-only modes.
- Around line 87-90: The pip_install invocation uses an uv-specific flag
(--group dev) which will fail when pip_install falls back to standard pip;
update the block that checks for requirements.txt to call pip_install -r
requirements.txt (so it works with both uv and pip) and keep the subsequent
pip_install mkl-static mkl-include lines unchanged; locate the pip_install call
in the block that references requirements.txt and replace the uv-specific
--group dev usage with the portable -r requirements.txt form.
In `@scripts/devinstall_triton.sh`:
- Around line 108-115: The install block for Torch currently calls
devinstall_torch with whatever INSTALL_TORCH contains, causing failures when
INSTALL_TORCH=skip; update the conditional in the script around the
INSTALL_TORCH check to explicitly handle the "skip" value by not calling
devinstall_torch (e.g., treat "skip" like unset and bypass installation),
otherwise continue to call devinstall_torch with "release" or the provided valid
value; locate the block using the INSTALL_TORCH variable and the
devinstall_torch function name to implement this guard.
---
Duplicate comments:
In `@scripts/devsetup.sh`:
- Around line 72-74: The script can double-install Torch because
scripts/devinstall_triton.sh may already handle INSTALL_TORCH; update
scripts/devsetup.sh so the devinstall_torch invocation is conditional: detect
whether devinstall_triton.sh claims ownership (e.g., an exported flag like
TRITON_HANDLES_TORCH or by grepping devinstall_triton.sh for
INSTALL_TORCH/devinstall_torch) and only call run_as_user devinstall_torch
"$INSTALL_TORCH" when that ownership flag is false or absent; ensure you
reference INSTALL_TORCH and the run_as_user devinstall_torch invocation and add
a clear comment explaining the guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 78b7617d-1bfa-41aa-9a49-9d5273df6fab
📒 Files selected for processing (10)
.github/workflows/amd-image.yml.github/workflows/cpu-image.yml.github/workflows/nvidia-image.ymlMakefiledockerfiles/Dockerfile.tritondockerfiles/Dockerfile.triton-amddockerfiles/Dockerfile.triton-cpuscripts/devinstall_torch.shscripts/devinstall_triton.shscripts/devsetup.sh
| SUDO='' | ||
| if ((EUID != 0)) && command -v sudo &>/dev/null; then | ||
| SUDO="sudo" | ||
| elif ((EUID != 0)); then | ||
| echo "ERROR: $(basename "$0") requires root privileges or sudo." >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Privilege check is too broad and blocks wheel-only installs.
This exits early for non-root users without sudo even when running release/nightly/test, which don’t need system package installation.
Suggested fix
-SUDO=''
-if ((EUID != 0)) && command -v sudo &>/dev/null; then
- SUDO="sudo"
-elif ((EUID != 0)); then
- echo "ERROR: $(basename "$0") requires root privileges or sudo." >&2
- exit 1
-fi
+SUDO=''
+if ((EUID != 0)) && command -v sudo &>/dev/null; then
+ SUDO="sudo"
+fi install_build_deps() {
echo "Installing Torch build dependencies ..."
+ if ((EUID != 0)) && [ -z "${SUDO}" ]; then
+ echo "ERROR: $(basename "$0") requires root privileges or sudo for build dependencies." >&2
+ exit 1
+ fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/devinstall_torch.sh` around lines 27 - 33, The current privilege
check unconditionally exits for non-root users without sudo by guarding SUDO,
EUID and command -v sudo; change it so non-root without sudo is allowed when
doing wheel-only installs (e.g., install modes 'release', 'nightly', 'test') by
first detecting the requested install mode/argument and only enforcing the sudo
requirement for modes that need system package installation. Concretely, inspect
the install mode variable/argument used earlier in the script and wrap the
existing SUDO/EUID/command -v sudo logic so it runs only for non-wheel modes;
keep the SUDO variable, EUID check and command -v sudo usage but skip the exit
branch for wheel-only modes.
| if [ -f requirements.txt ]; then | ||
| pip_install --group dev | ||
| pip_install mkl-static mkl-include | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
git clone --depth 1 https://github.com/pytorch/pytorch.git "$tmp/pytorch"
echo "== Check PyTorch dependency metadata =="
if [ -f "$tmp/pytorch/pyproject.toml" ]; then
rg -n '^\[dependency-groups\]|^dev\s*=' "$tmp/pytorch/pyproject.toml" || true
fi
rg -n '^' "$tmp/pytorch/requirements.txt" | head -n 30 || true
echo "== Check local pip/uv support for --group =="
pip --help | rg -n -- '--group' || true
if command -v uv >/dev/null 2>&1; then
uv pip install --help | rg -n -- '--group' || true
fiRepository: redhat-et/triton-dev-containers
Length of output: 1826
🏁 Script executed:
cat -n scripts/devinstall_torch.shRepository: redhat-et/triton-dev-containers
Length of output: 7064
🏁 Script executed:
ls -la | head -20Repository: redhat-et/triton-dev-containers
Length of output: 1257
🏁 Script executed:
find . -name "requirements.txt" -type f 2>/dev/null | head -10Repository: redhat-et/triton-dev-containers
Length of output: 57
🏁 Script executed:
grep -n "pip_install" scripts/devinstall_torch.sh | head -20Repository: redhat-et/triton-dev-containers
Length of output: 522
🏁 Script executed:
grep -r "\-\-group" scripts/Repository: redhat-et/triton-dev-containers
Length of output: 128
🏁 Script executed:
grep -r "pip_install" . --include="*.sh" 2>/dev/nullRepository: redhat-et/triton-dev-containers
Length of output: 1858
--group dev flag incompatible with fallback to standard pip.
Line 88 uses pip_install --group dev, which relies on uv-specific syntax. Since pip_install falls back to standard pip when uv is unavailable (line 39), this command will fail with an unrecognized flag error if uv is not installed. The file check at line 87 suggests intent to install from requirements.txt, but the command should use -r requirements.txt for portability across both pip and uv.
Suggested fix
if [ -f requirements.txt ]; then
- pip_install --group dev
+ pip_install -r requirements.txt
pip_install mkl-static mkl-include
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ -f requirements.txt ]; then | |
| pip_install --group dev | |
| pip_install mkl-static mkl-include | |
| fi | |
| if [ -f requirements.txt ]; then | |
| pip_install -r requirements.txt | |
| pip_install mkl-static mkl-include | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/devinstall_torch.sh` around lines 87 - 90, The pip_install invocation
uses an uv-specific flag (--group dev) which will fail when pip_install falls
back to standard pip; update the block that checks for requirements.txt to call
pip_install -r requirements.txt (so it works with both uv and pip) and keep the
subsequent pip_install mkl-static mkl-include lines unchanged; locate the
pip_install call in the block that references requirements.txt and replace the
uv-specific --group dev usage with the portable -r requirements.txt form.
There was a problem hiding this comment.
Incorrect, pip install --group dev is supported by both.
There was a problem hiding this comment.
@cmagina, you're correct — my claim that --group is uv-specific was wrong. pip install --group support for PEP 735 Dependency Groups was added in pip 25.1, so it is indeed supported by both pip (25.1+) and uv.
That said, there's still a subtle issue worth noting: --group installs from pyproject.toml dependency groups, accepting arguments of the form group or path:group — it reads from pyproject.toml, not from requirements.txt. The surrounding if [ -f requirements.txt ] guard checks for a different file format entirely, so the two may not be logically consistent (the guard passes if requirements.txt exists, but the install reads from pyproject.toml). This is separate from the uv/pip question and may be worth a quick look to ensure the intent is correct.
I apologize for the incorrect original comment.
📝 Learnings were identified and not saved (knowledge base disabled). Enable
| if [ -n "${compute_platform:-}" ]; then | ||
| PIP_TORCH_INDEX_URL="${pip_torch_index_url_base}/${compute_platform}" | ||
| pip_install_args+=("--index-url" "$PIP_TORCH_INDEX_URL") | ||
| fi |
There was a problem hiding this comment.
nightly/test may silently miss the intended index.
If no CUDA/ROCm/CPU backend is inferred, no index URL is set, so nightly/test can resolve from default PyPI instead of the nightly/test channel.
Suggested fix
if [ -n "${compute_platform:-}" ]; then
PIP_TORCH_INDEX_URL="${pip_torch_index_url_base}/${compute_platform}"
pip_install_args+=("--index-url" "$PIP_TORCH_INDEX_URL")
+ elif [ "$pip_build" != "release" ]; then
+ # Ensure nightly/test always use the intended channel
+ pip_install_args+=("--index-url" "${pip_torch_index_url_base}/cpu")
fi
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ -n "${compute_platform:-}" ]; then | |
| PIP_TORCH_INDEX_URL="${pip_torch_index_url_base}/${compute_platform}" | |
| pip_install_args+=("--index-url" "$PIP_TORCH_INDEX_URL") | |
| fi | |
| if [ -n "${compute_platform:-}" ]; then | |
| PIP_TORCH_INDEX_URL="${pip_torch_index_url_base}/${compute_platform}" | |
| pip_install_args+=("--index-url" "$PIP_TORCH_INDEX_URL") | |
| elif [ "$pip_build" != "release" ]; then | |
| # Ensure nightly/test always use the intended channel | |
| pip_install_args+=("--index-url" "${pip_torch_index_url_base}/cpu") | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/devinstall_torch.sh` around lines 167 - 170, The current block only
sets PIP_TORCH_INDEX_URL when compute_platform is non-empty, which lets
"nightly"/"test" builds fall back to PyPI; update the logic around
compute_platform/PIP_TORCH_INDEX_URL so that when the requested channel is a
non-stable one (detect via TORCH_VERSION containing "nightly" or "test" or a
TORCH_CHANNEL env var) you still set PIP_TORCH_INDEX_URL (use
pip_torch_index_url_base with compute_platform if present, otherwise use
pip_torch_index_url_base alone) and push it into pip_install_args; change the
code around the compute_platform check that currently sets PIP_TORCH_INDEX_URL
and pip_install_args to handle the empty-compute_platform case for nightly/test
to avoid resolving from default PyPI.
| "INSTALL_JUPYTER" | ||
| "INSTALL_LLVM" | ||
| "INSTALL_TOOLS" | ||
| "INSTALL_TORCH" |
There was a problem hiding this comment.
do we also need these here as well?
TORCH_GITREF
PIP_TORCH_VERSION
PIP_TORCHVISION_VERSION
PIP_TORCHAUDIO_VERSION
PIP_TORCH_INDEX_URL
There was a problem hiding this comment.
Yes, this is a miss from the claude assistance. It separated everything very harshly and missed a ton of the details. I should have provided better guidance and feedback during the process. Currently, those things are not even passed in from the Makefile, but when we do the runtime rework they will be added. I'll add these as they should be there for other uses, i.e. direct podman run setups.
Signed-off-by: Craig Magina <cmagina@redhat.com>
Signed-off-by: Craig Magina <cmagina@redhat.com>
…form Signed-off-by: Craig Magina <cmagina@redhat.com>
Signed-off-by: Craig Magina <cmagina@redhat.com>
Signed-off-by: Craig Magina <cmagina@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/devinstall_triton.sh`:
- Around line 108-116: The current conditional block incorrectly treats
INSTALL_TORCH=skip as falling through to install release; update the logic
around INSTALL_TORCH so "skip" is explicitly honored: modify the if/elif chain
checking INSTALL_TORCH (the block using INSTALL_TORCH, the string
"source"/"skip" comparisons and the call to devinstall_torch) to first check if
INSTALL_TORCH == "skip" and do nothing (or explicitly echo skipping), then
handle "source" and non-empty non-skip values (call devinstall_torch with the
specified value), and finally default to devinstall_torch release only when
INSTALL_TORCH is unset/empty.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4dac08d3-0329-4dbc-8204-605ef75b2566
📒 Files selected for processing (3)
scripts/devinstall_torch.shscripts/devinstall_triton.shscripts/devsetup.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/devinstall_torch.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/devsetup.sh
Summary
This PR is part 3 of 11 in the rework modernization effort. It adds the PyTorch setup script as part of the modular framework installation architecture.
Changes:
scripts/setup_torch.shfor PyTorch installation and configurationFeatures:
INSTALL_TORCHenvironment variable:source: Build from source (with auto-download if not mounted)release,nightly,test: Install wheels from PyPIskip: Skip PyTorch installationWhy this change?
PyTorch is a core framework used by Triton, vLLM, and other projects. This dedicated setup script:
Testing:
Dependencies:
Related PRs:
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores